Preparatory splitstream format changes for ostree support #185

alexlarsson · 2025-09-29T09:04:18Z

This changes the splitstream format a bit, with the goal of allowing splitstreams to support ostree files as well (see #144), but it is imho a generally nice change.

The primary differences are:

* The header is not compressed.
* All referenced fs-verity objects are stored in the header, including external chunks, mapped splitstreams and (a new feature) references that are not used in chunks.
* The mapping table is separate from the reference table (and generally smaller), and indexes into it.
* There is a magic value to detect the file format.
* There is a magic content type to detect the type wrapped in the stream.
* We store a tag for what ObjectID format is used.
* The total size of the stream is stored in the header.

The ability to reference file objects in the repo even if they are not part of the splitstream "content" will be useful for the ostree support to reference file content objects.

This change also allows more efficient GC enumeration, because we don't have to parse the entire splitstream to find the referenced objects.
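
To illustrate the GC point, here is a minimal sketch of enumerating referenced objects from the header alone. The field order, the little-endian integers, the placeholder magic bytes and the function name `referenced_objects` are all assumptions made for this example; only the `magic: [u8; 7]`, `algorithm: u8`, `n_refs` and `n_mappings` fields appear in the PR's `doc/splitstream.md` excerpts, and the real layout may differ.

```rust
use std::io::{self, Read};

// Placeholder magic: the real SPLITSTREAM_MAGIC value is not shown in this thread.
const MAGIC: [u8; 7] = *b"SplStrm";
const SHA256_LEN: usize = 32;

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf)) // little-endian is an assumption
}

/// Returns the object IDs listed in the (assumed) header layout.
/// Nothing after the reference table is read.
pub fn referenced_objects<R: Read>(r: &mut R) -> io::Result<Vec<[u8; SHA256_LEN]>> {
    let mut magic = [0u8; 7];
    r.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }

    let mut algorithm = [0u8; 1];
    r.read_exact(&mut algorithm)?;
    if algorithm[0] != 1 {
        // 1 == sha256 per the doc excerpt; this sketch handles nothing else.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unsupported algorithm"));
    }

    let _content_type = read_u64(r)?;
    let _total_size = read_u64(r)?;
    let n_refs = read_u64(r)?;
    let _n_mappings = read_u64(r)?;

    // Push one ID at a time so a corrupt n_refs cannot trigger a huge allocation.
    let mut refs = Vec::new();
    for _ in 0..n_refs {
        let mut id = [0u8; SHA256_LEN];
        r.read_exact(&mut id)?;
        refs.push(id);
    }
    Ok(refs)
}
```

Because the header is uncompressed and comes first, a GC pass along these lines would stop reading as soon as the reference table ends.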

cgwalters · 2025-09-29T12:29:31Z

crates/composefs-oci/src/skopeo.rs


 use crate::{sha256_from_descriptor, sha256_from_digest, tar::split_async, ContentAndVerity};

+pub const TAR_LAYER_CONTENT_TYPE: u64 = 0x2a037edfcae1ffea;


Can you add a comment for where these came from? I'm guessing random? If so just add a comment // Random unique ID ?

That said, I wonder if it wouldn't be nicer to store (variable length) strings for this in the format? Maybe it could even be suggested to literally be the mediaType from OCI (if applicable).

Yes. These are random.

But, I'd rather avoid having variable length things in the header. That makes parsing it much more tricky.

We could make it a real uuid tho

I'm fine keeping as u64 too.

I added some comments here

cgwalters · 2025-09-29T12:31:27Z

crates/composefs/src/repository.rs

+    pub fn has_named_stream(&self, name: &str) -> bool {
+        let stream_path = format!("streams/refs/{}", name);
+
+        readlinkat(&self.repository, &stream_path, []).is_ok()


I don't like "swallowing" errors like this, I'd say call stat instead and require it's S_IFLNK

I redid this with stat()
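
A rough sketch of what a stat-based check could look like, for readers following along. It uses `std::fs::symlink_metadata` (an `lstat` equivalent) rather than the rustix calls in the repository, and the function name `is_symlink_strict` and the choice to return `io::Result<bool>` are assumptions for illustration, not what the PR ended up with.

```rust
use std::io;
use std::path::Path;

/// Ok(true) if `path` is a symlink, Ok(false) if nothing exists there,
/// and an error for everything else (wrong file type, EACCES, EIO, ...).
pub fn is_symlink_strict(path: &Path) -> io::Result<bool> {
    // symlink_metadata does not follow the final symlink, like lstat(2).
    match std::fs::symlink_metadata(path) {
        Ok(meta) if meta.file_type().is_symlink() => Ok(true),
        Ok(_) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} exists but is not a symlink", path.display()),
        )),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

The point of the distinction is that "not there" is a normal answer, while a permission error or a regular file in `streams/refs/` is reported instead of being folded into `false`.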

cgwalters · 2025-09-29T12:53:54Z

crates/composefs/src/splitstream.rs

+#[derive(Clone, Debug, FromBytes, Immutable, IntoBytes, KnownLayout)]
+#[repr(C)]
+pub struct MappingEntry {
+    pub body: Sha256Digest,


If we're changing the format here...I think it'd be nice to make this one extensible.

However...bigger picture there's another consideration: There's obviously a metric ton of binary serialization formats out there. A custom one isn't wrong necessarily but...how about say CBOR ? It has some usage and a proper RFC etc.

I guess a dividing line is "do we care about mmap()"? Probably not?

splitstreams are essentially thin wrappers of existing binary formats (tars, ostree objects, etc), adding just references to other composefs repo objects. I'm not sure it's overly helpful to use a complicated binary format for the wrapping, especially one which is completely different from the inner format.

That said, I agree that we should make it at least a bit extensible. This MR adds a magic header, but also adding a version field and a few bytes of unused/unparsed space does seem quite useful.

We discussed mmap, and the end result was, no, we don't want it.

> splitstreams are essentially thin wrappers of existing binary formats (tars, ostree objects, etc), adding just references to other composefs repo objects. I'm not sure it's overly helpful to use a complicated binary format for the wrapping, especially one which is completely different from the inner format.

Yeah, though the nice thing about Rust here is that for stuff like this there's a lot of well-done crates.

It also makes it a lot more obvious and easy to parse from other languages too if we can say "it's just CBOR" (or whatever).

Anyways: I'm basically fine with this as is too.

Although what about the algorithm agility? There's been some thoughts that for post-quantum crypto we may need to get away from sha256 in theory as far as I understand things.

I think the right thing is to add a header size field, and skip parts we don't understand. Then we can easily extend this later.

I added a new extension_size field which we skip on read.
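
For illustration, skipping an extension area on read might look roughly like this. The width of `extension_size` (a little-endian `u64` here), its position directly before the extension bytes, and the helper name `skip_extension` are assumptions; the thread only says the field exists and is skipped.

```rust
use std::io::{self, Read};

/// Reads an (assumed) u64 `extension_size` and discards that many bytes,
/// leaving the reader positioned at the first byte after the extension area.
pub fn skip_extension<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    let extension_size = u64::from_le_bytes(buf);

    // Works for any reader, including non-seekable ones such as pipes.
    let skipped = io::copy(&mut r.by_ref().take(extension_size), &mut io::sink())?;
    if skipped != extension_size {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "stream ended inside the extension area",
        ));
    }
    Ok(extension_size)
}
```

An older reader built this way keeps working when a newer writer puts fields it does not know about into the extension area.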

This changes the splitstream format a bit, with the goal of allowing splitstreams to support ostree files as well (see containers#144).

The primary differences are:

* The header is not compressed
* All referenced fs-verity objects are stored in the header, including external chunks, mapped splitstreams and (a new feature) references that are not used in chunks.
* The mapping table is separate from the reference table (and generally smaller), and indexes into it.
* There is a magic value to detect the file format.
* There is a magic content type to detect the type wrapped in the stream.
* We store a tag for what ObjectID format is used
* The total size of the stream is stored in the header.

The ability to reference file objects in the repo even if they are not part of the splitstream "content" will be useful for the ostree support to reference file content objects.

This change also allows more efficient GC enumeration, because we don't have to parse the entire splitstream to find the referenced objects.

Signed-off-by: Alexander Larsson <alexl@redhat.com>

allisonkarlitskaya

Let's talk about this. I might have a bit of bandwidth to work on this if you like.

allisonkarlitskaya · 2025-10-07T06:30:26Z

doc/splitstream.md

+
+struct SplitstreamHeader {
+    magic: [u8; 7], // Contains SPLITSTREAM_MAGIC
+    algorithm: u8,  // The fs-verity algorithm used, 1 == sha256, 2 == sha512


We should also do fs-verity block size here. That's usually expressed as a bit-shift count, so 12 or 16...

We could also write it like "fsverity-sha256-12" or so as a string... some relevant discussion in #181.
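
As a sketch of the string form, a parser for identifiers shaped like "fsverity-sha256-12" could look like the following. The exact grammar, the type names and the `parse_verity_params` function are hypothetical; the only inputs taken from the thread are the example string and the idea that the trailing number is a bit-shift count.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Algorithm {
    Sha256,
    Sha512,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VerityParams {
    pub algorithm: Algorithm,
    /// log2 of the fs-verity block size, e.g. 12 for 4096-byte blocks.
    pub block_shift: u8,
}

impl VerityParams {
    pub fn block_size(&self) -> u64 {
        1u64 << self.block_shift
    }
}

/// Parses strings of the assumed form "fsverity-<hash>-<shift>".
pub fn parse_verity_params(s: &str) -> Option<VerityParams> {
    let rest = s.strip_prefix("fsverity-")?;
    let (hash, shift) = rest.split_once('-')?;
    let algorithm = match hash {
        "sha256" => Algorithm::Sha256,
        "sha512" => Algorithm::Sha512,
        _ => return None,
    };
    let block_shift: u8 = shift.parse().ok()?;
    if block_shift >= 64 {
        return None; // would overflow the u64 shift in block_size()
    }
    Some(VerityParams { algorithm, block_shift })
}
```

With this shape, "fsverity-sha256-12" yields a 4096-byte block size and "fsverity-sha512-16" a 65536-byte one.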

allisonkarlitskaya · 2025-10-07T06:32:00Z

doc/splitstream.md

+    n_refs: u64,
+    n_mappings: u64,
+    refs: [ObjectID; n_refs]    // sorted
+    mappings: [MappingEntry; n_mappings] // sorted by body


It's so sketch that we hardcode sha256 here... I think that's probably OK, but maybe we'd add an extension mechanism so we could add new types of mappings tables...

Something like this from the start:

    magic
    n_sections
    n_sections * (
        section_start
        section_size_in_bytes
    )

We could name the sections but I think it's quite OK to just know what the numbers mean and require that they're all present, in order. An empty section would be denoted by a zero size.
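
A minimal sketch of reading such a section table, assuming little-endian `u64` fields and that the magic has already been consumed. `Section`, `read_section_table` and the `MAX_SECTIONS` sanity limit are names and values invented for this example.

```rust
use std::io::{self, Read};

// Arbitrary sanity limit for this sketch, not part of any proposed format.
const MAX_SECTIONS: u64 = 1024;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Section {
    pub start: u64,
    pub size: u64,
}

impl Section {
    /// An empty section is denoted by a zero size.
    pub fn is_empty(&self) -> bool {
        self.size == 0
    }
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Reads `n_sections` followed by that many (start, size) pairs.
/// All `expected` known sections must be present, in order.
pub fn read_section_table<R: Read>(r: &mut R, expected: u64) -> io::Result<Vec<Section>> {
    let n_sections = read_u64(r)?;
    if n_sections < expected || n_sections > MAX_SECTIONS {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "unexpected section count",
        ));
    }
    let mut sections = Vec::new();
    for _ in 0..n_sections {
        let start = read_u64(r)?;
        let size = read_u64(r)?;
        sections.push(Section { start, size });
    }
    Ok(sections)
}
```

Accepting more sections than expected is one possible answer to the compat question; a stricter reader could reject any count other than `expected`.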

We could also get into compat vs. uncompat extensions... not sure how far it's worth going here...

cgwalters · 2025-10-30T12:42:34Z

We should get this in probably very soon and then I think declare the format stable?

allisonkarlitskaya · 2025-10-30T12:59:48Z

We should get this in probably very soon and then I think declare the format stable?

I have a branch..... lol

allisonkarlitskaya · 2025-10-30T13:07:48Z

One of the things that I'm tormenting myself on a bit right now is the sha256 mapping. I'm considering changing it to a general-purpose "named object reference" mapping: we could then have a hashmap mapping names like "sha256:12345" to the object ID in question, and adding sha512 would be seamless. I think I'd choose to encode that as a nul-separated series of strings of the form 0:name0, 1:name1, etc. and compress the whole thing.
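
A sketch of that encoding, leaving out the compression step. Reading the leading number as an index into the references array is an interpretation of the comment, and `encode_names` / `decode_names` are hypothetical helpers, not code from any branch.

```rust
use std::collections::HashMap;

/// Encodes (reference index, name) pairs as "0:name0\0 1:name1 ..." without
/// a trailing nul. Compression would be applied to the result separately.
pub fn encode_names(entries: &[(usize, &str)]) -> Vec<u8> {
    let parts: Vec<String> = entries
        .iter()
        .map(|(index, name)| format!("{index}:{name}"))
        .collect();
    parts.join("\0").into_bytes()
}

/// Decodes the nul-separated form into a name -> reference index map.
/// Returns None on malformed input.
pub fn decode_names(bytes: &[u8]) -> Option<HashMap<String, usize>> {
    let text = std::str::from_utf8(bytes).ok()?;
    let mut map = HashMap::new();
    if text.is_empty() {
        return Some(map);
    }
    for item in text.split('\0') {
        // Split on the first ':' only, since names such as "sha256:12345"
        // contain a ':' themselves.
        let (index, name) = item.split_once(':')?;
        map.insert(name.to_owned(), index.parse().ok()?);
    }
    Some(map)
}
```

Since the names are opaque strings, a "sha512:..." entry needs no format change, which is the flexibility the comment is after.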

The alternative is to stay with what we have now more or less, but it's much less flexible and is gonna be cruft one day, I'm sure. That being said: we have an extensibility mechanism now, at least...

The other thing that really needs fixing in @alexlarsson's work vs. the current version of the branch is that we should really take advantage of the fact that we have the references array out front now and use indexes into it from the splitstream content instead of repeating the whole object ID. It would have the additional advantage of ensuring that it became physically impossible to refer to an object that wasn't listed in the "depends" header (which we'll use during GC) and yet another advantage in that we could use the 64bit "starting word" for each internal/external section in the stream for both cases:

positive: this many internal bytes
negative or zero: an index into the references array

(or with the high bit as a flag or whatever).
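
The sign-based variant of that starting word could be sketched like this. `Chunk`, `encode_chunk` and `decode_chunk` are names made up for the example, and treating a zero-length inline chunk as unencodable is a choice of this sketch that follows from zero being reserved for reference index 0.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Chunk {
    /// This many bytes of inline data follow in the stream.
    Inline(u64),
    /// Index into the references array in the header.
    External(u64),
}

/// Positive word: inline byte count. Negative or zero: negated reference index.
pub fn encode_chunk(chunk: Chunk) -> Option<i64> {
    match chunk {
        // Zero would be read back as External(0), so it cannot mean "no bytes".
        Chunk::Inline(0) => None,
        Chunk::Inline(len) => i64::try_from(len).ok(),
        Chunk::External(index) => i64::try_from(index).ok().map(|i| -i),
    }
}

pub fn decode_chunk(word: i64) -> Chunk {
    if word > 0 {
        Chunk::Inline(word as u64)
    } else {
        // unsigned_abs avoids overflow on i64::MIN.
        Chunk::External(word.unsigned_abs())
    }
}
```

A reader would still need to check a decoded index against `n_refs`; the gain is that the stream body can no longer name an object ID that the header does not list.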

It's just "work" to get this over the line....

After that, I think we need to figure out a way to kill off the content-sha256-based naming of splitstreams and perhaps even consider getting rid of the streams/ directory entirely... each backend would have its own way of 'caching' interesting splitstream objects for itself. I'm not entirely sure how I'd do that for the OCI backend. In a related conversation with @Johan-Liebert1 we discussed having the layers (and possibly the config) ((and possibly possibly the manifest some day)) referenced from the erofs image itself as a way to optionally prevent those things from being GC'd. We'd probably want some sort of a better "lookup table" still, though... but the key difference is that this table would be unique to OCI, not some global "streams" directory that we try to pretend is sharable by everyone on equal footing...

alexlarsson force-pushed the splitstream-new-format branch from 3803dc3 to 9aedd96 on September 29, 2025 09:26

cgwalters reviewed Sep 29, 2025

alexlarsson force-pushed the splitstream-new-format branch from 9aedd96 to 057121b on October 6, 2025 14:47

alexlarsson force-pushed the splitstream-new-format branch from 057121b to bed66dc on October 6, 2025 15:19

allisonkarlitskaya requested changes Oct 7, 2025
